Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion
They call this scalable oversight, but they both train and evaluate the lie detector probes on labeled DolusChat examples. I don’t get why they call it scalable.
Claude’s analysis
https://claude.ai/share/d659b385-f625-4b86-9eb5-f8ce1fea33e5